A Bernstein-Von Mises Theorem for discrete probability distributions
We investigate the asymptotic normality of the posterior distribution in the
discrete setting, when model dimension increases with sample size. We consider
a probability mass function $\theta_0$ on $\mathbbm{N}\setminus \{0\}$ and a
sequence of truncation levels $(k_n)_n$ satisfying $k_n^3 \leq n \inf_{i \leq k_n} \theta_0(i)$.
Let $\hat{\theta}_n$ denote the maximum likelihood estimate of $(\theta_0(i))_{i \leq k_n}$
and let $\Delta_n(\theta_0)$ denote the $k_n$-dimensional vector whose $i$-th
coordinate is defined by $\sqrt{n}(\hat{\theta}_n(i)-\theta_0(i))$ for
$1 \leq i \leq k_n$. We check that under mild conditions on $\theta_0$ and on the
sequence of prior probabilities on the $k_n$-dimensional simplices, after
centering and rescaling, the variation distance between the posterior
distribution recentered around $\hat{\theta}_n$ and rescaled by $\sqrt{n}$, and
the $k_n$-dimensional Gaussian distribution $\mathcal{N}(\Delta_n(\theta_0), I^{-1}(\theta_0))$,
converges in probability to $0$.
This theorem can be used to prove the asymptotic normality of Bayesian
estimators of Shannon and R\'{e}nyi entropies. The proofs are based on
concentration inequalities for centered and non-centered Chi-square (Pearson)
statistics. The latter allow us to establish posterior concentration rates with
respect to the Fisher distance rather than with respect to the Hellinger
distance, as is commonplace in non-parametric Bayesian statistics.
Comment: Published at http://dx.doi.org/10.1214/08-EJS262 in the Electronic
Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
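To make the flavor of the result concrete, here is a minimal numerical sketch (our illustration, not the paper's code): for a multinomial model with a Dirichlet prior the posterior is again Dirichlet, so posterior draws can be recentered at the MLE, rescaled by $\sqrt{n}$, and compared with the Gaussian limit, whose covariance is the inverse Fisher information of the model.

    # Minimal sketch (illustration only): rescaled posterior draws for a
    # multinomial model with a Dirichlet prior should have covariance close
    # to the inverse Fisher information diag(theta0) - theta0 theta0^T.
    import numpy as np

    rng = np.random.default_rng(0)
    k, n = 5, 100_000                      # dimension and sample size
    theta0 = np.ones(k) / k                # true pmf
    counts = rng.multinomial(n, theta0)    # observed counts
    theta_hat = counts / n                 # maximum likelihood estimate

    # Posterior under a Dirichlet(1, ..., 1) prior, recentered and rescaled.
    post = rng.dirichlet(1.0 + counts, size=20_000)
    z = np.sqrt(n) * (post - theta_hat)

    emp_cov = np.cov(z, rowvar=False)
    fisher_inv = np.diag(theta0) - np.outer(theta0, theta0)
    print(np.abs(emp_cov - fisher_inv).max())  # small for large n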
Learning with Biased Complementary Labels
In this paper, we study the classification problem in which we have access to
an easily obtainable surrogate for true labels, namely complementary labels,
which specify classes that observations do \textbf{not} belong to. Let $Y$ and
$\bar{Y}$ be the true and complementary labels, respectively. We first model
the annotation of complementary labels via transition probabilities
$P(\bar{Y}=i|Y=j)$, $i \neq j \in \{1,\ldots,c\}$, where $c$ is the number of
classes. Previous methods implicitly assume that $P(\bar{Y}=i|Y=j)$,
$\forall i \neq j$, are identical, which is not true in practice because humans are
biased toward their own experience. For example, as shown in Figure 1, if an
annotator is more familiar with monkeys than prairie dogs when providing
complementary labels for meerkats, she is more likely to employ "monkey" as a
complementary label. We therefore reason that the transition probabilities will
be different. In this paper, we propose a framework that contributes three main
innovations to learning with \textbf{biased} complementary labels: (1) It
estimates transition probabilities with no bias. (2) It provides a general
method to modify traditional loss functions and extends standard deep neural
network classifiers to learn with biased complementary labels. (3) It
theoretically ensures that the classifier learned with complementary labels
converges to the optimal one learned with true labels. Comprehensive
experiments on several benchmark datasets validate the superiority of our
method over current state-of-the-art methods.
Comment: ECCV 2018 Oral.
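The loss-correction mechanism at the heart of such approaches can be sketched as follows (a hypothetical minimal numpy version, not the authors' implementation): given the classifier's predicted distribution $p(y|x)$ over true classes and a transition matrix $Q$ with $Q_{ji} = P(\bar{Y}=i|Y=j)$, the implied distribution over complementary labels is $Q^\top p$, and the cross-entropy is taken against the observed complementary label.

    # Hypothetical minimal sketch of loss correction with a transition
    # matrix (illustration, not the authors' code): the classifier predicts
    # p(y | x) over true classes; Q^T p is the implied distribution over
    # complementary labels.
    import numpy as np

    def corrected_loss(logits, comp_label, Q):
        """Cross-entropy on a complementary label, Q[j, i] = P(Ybar=i | Y=j)."""
        p = np.exp(logits - logits.max())   # softmax over true classes
        p /= p.sum()
        q = Q.T @ p                         # implied complementary-label probs
        return -np.log(q[comp_label] + 1e-12)

    # Toy usage: 3 classes with uniform (unbiased) off-diagonal transitions.
    c = 3
    Q = (np.ones((c, c)) - np.eye(c)) / (c - 1)
    print(corrected_loss(np.array([2.0, 0.5, -1.0]), comp_label=2, Q=Q))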
Structured Random Matrices
Random matrix theory is a well-developed area of probability theory that has
numerous connections with other areas of mathematics and its applications. Much
of the literature in this area is concerned with matrices that possess many
exact or approximate symmetries, such as matrices with i.i.d. entries, for
which precise analytic results and limit theorems are available. Much less well
understood are matrices that are endowed with an arbitrary structure, such as
sparse Wigner matrices or matrices whose entries possess a given variance
pattern. The challenge in investigating such structured random matrices is to
understand how the given structure of the matrix is reflected in its spectral
properties. This chapter reviews a number of recent results, methods, and open
problems in this direction, with a particular emphasis on sharp spectral norm
inequalities for Gaussian random matrices.
Comment: 46 pages; to appear in IMA Volume "Discrete Structures: Analysis and
Applications" (Springer).
PAC-Bayesian Bounds for Randomized Empirical Risk Minimizers
The aim of this paper is to generalize the PAC-Bayesian theorems proved by
Catoni in the classification setting to more general problems of statistical
inference. We show how to control the deviations of the risk of randomized
estimators. Particular attention is paid to randomized estimators drawn from a
small neighborhood of classical estimators, whose study makes it possible to
control the risk of the latter. These results allow us to bound the risk of
very general estimation procedures, as well as to perform model selection.
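For orientation, a representative Catoni-style PAC-Bayesian deviation bound (given here in a generic form, not necessarily the exact statement proved in the paper) reads as follows: for a loss bounded in $[0,1]$, a fixed prior $\pi$, a fixed $\lambda > 0$ and an i.i.d. sample of size $n$, with probability at least $1-\delta$, simultaneously for all posteriors $\rho$,

    \[
      \mathbb{E}_{\theta \sim \rho}\, R(\theta)
      \;\le\;
      \mathbb{E}_{\theta \sim \rho}\, r_n(\theta)
      \;+\; \frac{\lambda}{8n}
      \;+\; \frac{\mathrm{KL}(\rho \,\|\, \pi) + \log(1/\delta)}{\lambda},
    \]

where $R$ denotes the risk and $r_n$ the empirical risk; a randomized estimator is then a draw from such a posterior $\rho$.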
Convex recovery of a structured signal from independent random linear measurements
This chapter develops a theoretical analysis of the convex programming method
for recovering a structured signal from independent random linear measurements.
This technique delivers bounds for the sampling complexity that are similar
to recent results for standard Gaussian measurements, but the argument
applies to a much wider class of measurement ensembles. To demonstrate the
power of this approach, the paper presents a short analysis of phase retrieval
by trace-norm minimization. The key technical tool is a framework, due to
Mendelson and coauthors, for bounding a nonnegative empirical process.
Comment: 18 pages, 1 figure. To appear in "Sampling Theory, a Renaissance."
v2: minor corrections. v3: updated citations and increased emphasis on
Mendelson's contribution.
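The phase retrieval application admits a compact illustration (a minimal sketch assuming the cvxpy package; not the paper's code): lift the unknown $x$ to $X = x x^\top$ and minimize the trace over the semidefinite cone subject to the phaseless measurements $b_i = |\langle a_i, x\rangle|^2$.

    # Minimal sketch of phase retrieval by trace-norm minimization
    # (PhaseLift-style; assumes the cvxpy package, illustration only).
    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    n, m = 10, 60                           # signal dimension, measurements
    x_true = rng.standard_normal(n)
    A = rng.standard_normal((m, n))         # independent random measurements
    b = (A @ x_true) ** 2                   # phaseless observations

    X = cp.Variable((n, n), PSD=True)       # lifted variable X = x x^T
    cons = [cp.sum(cp.multiply(np.outer(A[i], A[i]), X)) == b[i]
            for i in range(m)]
    cp.Problem(cp.Minimize(cp.trace(X)), cons).solve()

    # Recover x (up to global sign) from the top eigenvector of X.
    w, V = np.linalg.eigh(X.value)
    x_hat = np.sqrt(max(w[-1], 0.0)) * V[:, -1]
    print(min(np.linalg.norm(x_hat - x_true), np.linalg.norm(x_hat + x_true)))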
Mirror Descent and Convex Optimization Problems With Non-Smooth Inequality Constraints
We consider the problem of minimizing a convex function on a simple set
with a convex non-smooth inequality constraint, and describe first-order methods
to solve such problems in different situations: smooth or non-smooth objective
function; convex or strongly convex objective and constraint; deterministic or
randomized information about the objective and constraint. We hope that it is
convenient for a reader to have all the methods for different settings in one
place. The described methods are based on the Mirror Descent algorithm and the
switching subgradient scheme. One of our aims is to propose, for the listed
settings, a Mirror Descent method with adaptive stepsizes and an adaptive
stopping rule, so that neither the stepsize nor the stopping rule requires
knowledge of the Lipschitz constant of the objective or constraint. We also
construct Mirror Descent for problems whose objective function is not Lipschitz
continuous, e.g., a quadratic function. Besides that, we address the problem
of recovering the solution of the dual problem.
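A minimal Euclidean instance of the switching subgradient scheme (our simplified sketch, with the Mirror Descent prox taken as squared Euclidean distance; not the paper's general algorithm) looks as follows: step along a subgradient of the objective $f$ whenever the constraint $g(x) \leq \varepsilon$ holds, otherwise along a subgradient of $g$, with adaptive stepsizes $\varepsilon / \|d\|^2$, and output the average of the productive iterates.

    # Simplified Euclidean sketch of the switching subgradient scheme
    # (illustration only). Productive steps use a subgradient of f when
    # g(x) <= eps; otherwise we step along a subgradient of g. Stepsizes
    # adapt to the subgradient norm, so no Lipschitz constant is needed.
    import numpy as np

    def switching_md(f_grad, g, g_grad, project, x0, eps, n_iter=5000):
        x, productive = x0.copy(), []
        for _ in range(n_iter):
            if g(x) <= eps:                 # productive step on f
                d = f_grad(x)
                productive.append(x.copy())
            else:                           # non-productive step on g
                d = g_grad(x)
            x = project(x - (eps / (d @ d)) * d)
        return np.mean(productive, axis=0)  # average of productive iterates

    # Toy usage: minimize ||x - c||_1 over the unit box subject to the
    # non-smooth constraint ||x||_1 - 1 <= 0.
    c = np.array([2.0, -1.5, 0.5])
    x_hat = switching_md(
        f_grad=lambda x: np.sign(x - c),
        g=lambda x: np.abs(x).sum() - 1.0,
        g_grad=lambda x: np.sign(x),
        project=lambda x: np.clip(x, -1.0, 1.0),
        x0=np.zeros(3), eps=1e-2)
    print(x_hat, np.abs(x_hat).sum())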
Sparsity and Incoherence in Compressive Sampling
We consider the problem of reconstructing a sparse signal $x^0 \in \mathbb{R}^n$ from a
limited number of linear measurements. Given $m$ randomly selected samples of
$U x^0$, where $U$ is an orthonormal matrix, we show that $\ell_1$ minimization
recovers $x^0$ exactly when the number of measurements exceeds
$m \geq \mathrm{Const} \cdot \mu^2(U) \cdot S \cdot \log n$, where $S$ is the number of
nonzero components in $x^0$ and $\mu(U)$ is the largest entry in $U$ properly
normalized: $\mu(U) = \sqrt{n} \cdot \max_{k,j} |U_{k,j}|$. The smaller $\mu(U)$,
the fewer samples needed.
The result holds for ``most'' sparse signals $x^0$ supported on a fixed (but
arbitrary) set $T$. Given $T$, if the sign of $x^0$ for each nonzero entry on
$T$ and the observed values of $U x^0$ are drawn at random, the signal is
recovered with overwhelming probability. Moreover, there is a sense in which
this is nearly optimal since any method succeeding with the same probability
would require just about this many samples.
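A small numeric illustration of the incoherence parameter (ours; constants in the bound omitted): the DCT basis is nearly maximally incoherent, so the proxy $\mu^2(U)\, S \log n$ stays far below $n$, while for the identity basis $\mu(U) = \sqrt{n}$ and the bound becomes vacuous.

    # Sketch (illustration only): the incoherence mu(U) = sqrt(n) * max |U_kj|
    # and the sample-count proxy mu^2 * S * log(n) for two orthonormal bases.
    import numpy as np

    n, S = 256, 8
    j = np.arange(n)

    # Orthonormal DCT-II matrix: nearly maximally incoherent (mu ~ sqrt(2)).
    U = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j[None, :] + 1) * j[:, None] / (2 * n))
    U[0, :] = np.sqrt(1.0 / n)
    mu = np.sqrt(n) * np.abs(U).max()
    print("DCT:      mu =", round(mu, 3), " proxy m ~", round(mu**2 * S * np.log(n)))

    # Identity basis: maximally coherent, so the proxy exceeds n (vacuous).
    mu_id = np.sqrt(n) * np.abs(np.eye(n)).max()
    print("identity: mu =", round(mu_id, 3), " proxy m ~", round(mu_id**2 * S * np.log(n)))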
Some inequalities on generalized entropies
We give several inequalities on generalized entropies involving Tsallis
entropies, using some inequalities obtained by improvements of Young's
inequality. We also give a generalized Han's inequality.
Comment: 15 pages.
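For background (standard definitions, not specific to this paper), the Tsallis entropy $S_q(p) = (1 - \sum_i p_i^q)/(q-1)$ generalizes the Shannon entropy, which it recovers in the limit $q \to 1$; the sketch below checks this numerically.

    # Standard-definition sketch: Tsallis entropy and its q -> 1 Shannon limit.
    import numpy as np

    def tsallis_entropy(p, q):
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        if np.isclose(q, 1.0):              # q -> 1 limit: Shannon entropy
            return -(p * np.log(p)).sum()
        return (1.0 - (p ** q).sum()) / (q - 1.0)

    p = np.array([0.5, 0.25, 0.125, 0.125])
    for q in (0.5, 0.99, 1.0, 1.01, 2.0):
        print(q, tsallis_entropy(p, q))
    # The values at q = 0.99 and 1.01 bracket the Shannon entropy at q = 1.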